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Abstract. Geometric optimisation algorithms are developed that efficiently find the nearest low-rank correlation matrix. 
We show, in numerical tests, that our methods compare favourably to the existing methods in the literature. The connection 
with the Lagrange multiplier method is established, along with an identification of whether a local minimum is a global 
minimum. An additional benefit of the geometric approach is that any weighted norm can be applied. The problem of 
finding the nearest low-rank correlation matrix occurs as part of the calibration of multi-factor interest rate market models 
to correlation. 
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1. Introduction. The problem of finding the nearest low-rank correlation matrix occurs in areas 
such as finance, chemistry, physics and image processing. The mathematical formulation of this problem 
is as follows. Let S„ denote the set of real symmetric n x n matrices and let C be a symmetric nxn 
matrix with unit diagonal. For X G §„ we denote hy X ^ that X is positive semidefinite. Let the 
desired rank d € {1, . . . , n} be given. The problem is then given by 

Find X e §„ 

to minimize i||C-X||2 (1.1) 
subject to rank(X) < d; Xu = 1, i = 1, . . . ,n; X ^ 0. 

Here || • || denotes a semi-norm on §„. The most important instance is 

i II C - ^ 1 ^ M/,, {Q, ~ X,,)\ (1.2) 

where is a weights matrix consisting of non-negative elements. In words: Find the low-rank correlation 
matrix X nearest to the given nxn matrix C. The choice of the semi-norm will reflect what is meant by 
nearness of the two matrices. The semi-norm in (1.2) is well known in the literature, and it is called the 
Hadamard semi-norm, see Horn & Johnson (1990). Note that the constraint set is non-convex for d < n, 
which makes it not straightforward to solve Problem (1.1) with standard convex optimization methods. 
For concreteness, consider the following example. Suppose C is 

1.0000 -0.1980 -0.3827 
-0.1980 1.0000 -0.2416 
-0.3827 -0.2416 1.0000 

and W is the full matrix, Wij — 1. With the algorithm developed in this paper, we solve (1.1) with C as 
above and d = 2. The algorithm takes as initial input a matrix X^^^^ of rank 2 or less, for example, 

1.0000 0.9782 0.8982 
= I 0.9782 1.0000 0.9699 
0.8982 0.9699 1.0000 

and then produces a sequence of points on the constraint set that converges to the point 

1.0000 -0.4068 -0.6277 
X* = I -0.4068 1.0000 -0.4559 
-0.6277 -0.4559 1.0000 
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Fig. 1.1. The shell represents the set o/ 3 X 3 correlation matrices of rank 2 or less. The details of this representation 
are given in Section 6.2. 

that solves (1.1). The constraint set and the points generated by the algorithm have been represented 
in Figure 1.1. The details of this representation are given in Section 6.2. The blue point in the center 
and the green point represent, respectively, the target matrix C and the solution point X*. As the figure 
suggests, the algorithm has fast convergence and the constraint set is a curved space. 

This novel technique we propose, is based on geometric optimisation that can locally minimize the 
objective function in (1.1) and which incorporates the Hadamard semi-norm. In fact, our method can be 
applied to any sufficiently smooth objective function. Not all other methods available in the literature 
that aim to solve (1.1) can handle an arbitrary objective function, see the literature review in Section 2. 
We formulate the problem in terms of Riemannian geometry. This approach allows us to use numerical 
methods on manifolds that are numerically stable and efficient, in particular the Riemannian-Newton 
method is applied. We show, for the numerical tests we performed, that the numerical efficiency of 
geometric optimisation compares favourably to the other algorithms available in the literature. The only 
drawback of the practical use of geometric optimisation is that the implementation is rather involved. 
To overcome this drawback, we have made available a MATLAB implementation 'LRCM min' (low-rank 
correlation matrices minimization) at www.few.eur.nl/few/people/pietersz. 

We develop a technique to instantly check whether an obtained local minimum is a global minimum, by 
adaptation of Lagrange multiplier results of Zhang & Wu (2003) . The novelty consists of an expression for 
the Lagrange multipliers given the matrix X, whereas until now only the reverse direction (an expression 
for the matrix X given the Lagrange multipliers) was known. The fact that one may instantly identify 
whether a local minimum is a global minimum is very rare for non-convex optimisation problems, and 
that makes Problem (1.1), which is non-convex for d < n, all the more interesting. 

Problem (1.1) is important in finance, as it occurs as part of the calibration of the multi-factor 
LIBORS market model of Brace, Gatarek & Musiela (1997), Miltersen, Sandmann & Sondermann (1997), 
Jamshidian (1997) and Musiela & Rutkowski (1997). This model is an interest rate derivatives pricing 
model and it is used in some financial institutions for valuation and risk management of their interest rate 
derivatives portfolio. The number of stochastic factors needed for the model to fit to the given correlation 
matrix is equal to the rank of the correlation matrix. This rank can be as high as the number of forward 
LIBORs in the model, i.e., as high as the dimension of the matrix. The number of LIBORs in the model 
can grow large in practical applications, for example, a model with over 80 LIBORs is not uncommon. 
This implies that the number of factors needed to fit the model to the given correlation matrix can be 
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high, too. There is much empirical evidence that the term structure of interest rates is driven by muhiple 
factors (three, four, or even more), see the review article of Dai & Singleton (2003). Though the number 
of factors driving the term structure may be four or more, the empirical work shows that it is certainly 
not as high as, say, 80. This is one reason for using a model with a low number of factors. Another reason 
is the enhanced efficiency when estimating the model-price of an interest rate derivative through Monte 
Carlo simulation. First, a lower factor model simply requires drawing less random numbers than a higher 
factor model. Second, the complexity of calculating LIBOR rates over a single time step in a simulation 
implementation is of order n x d, with n the number of LIBORs and d the number of factors, see Joshi 

(2003) . 

The importance of Problem (1.1) in finance has been recognized by many researchers. In fact, the 
literature review of Section 2 refers to seventeen articles or books addressing the problem. 

Due to its generality our method finds locally optimal points for a variety of other objective functions 
subject to the same constraints. One of the most famous problems comes from physics and is called 
Thomson's problem. The Thomson problem is concerned with minimizing the potential energy of n charged 
particles on the sphere in ISP (d = 3). Geometric optimisation techniques have previously been applied to 
the Thomson problem by Depczynski & Stockier (1998), but these authors have only considered conjugate 
gradient techniques on a 'bigger' manifold, in which the freedom of rotation has not been factored out. 
In comparison, we stress here that our approach considers a lower dimensional manifold, which allows 
for Newton's algorithm (the latter not developed in Depczynski & Stockier (1998)). An implementation 
of geometric optimisation applied to the Thomson problem has also been included in the 'LRCM min' 
package. 

Finally, for a literature review of interest rate models, the reader is referred to Rebonato (2004a). 

The paper is organized as follows. In Section 2, the literature is reviewed. In Section 3, the constraints 
of the problem are formulated in terms of differential geometry. We parameterize the set of correlation 
matrices of rank at most d with a manifold named the Cholesky manifold. This is a canonical space 
for the optimisation of the arbitrary smooth function subject to the same constraints. In Section 4, the 
Riemannian structure of the Cholesky manifold is introduced. Formulas are given for parallel transport, 
geodesies, gradient and Hessian. These are needed for the minimization algorithms, which are made 
explicit. In Section 5, we discuss the convergence of the algorithms. In Section 6, the application of 
the algorithms to the problem of finding the nearest low-rank correlation matrix is worked out in detail. 
In Section 7, we numerically investigate the algorithms in terms of efficiency. Finally, in Section 8, we 
conclude the paper. 

2. Literature review. We mention five algorithms available in the literature for rank reduction of 
correlation matrices, in reverse chronological order. 

First, we mention the majorization approach of Pietersz & Groencn (2004a, b). This reference 
also contains a literature review covering all algorithms except for the alternating projections algorithm 
(discussed below). Majorization can handle an entry- weighted objective function and is guaranteed to 
converge to a stationary point. The rate of convergence is sub-linear. 

Second, we point out the Lagrange multiplier algorithm of Zhang & Wu (2003) and Wu (2003). The 
correlation matrices produced by this algorithm are not guaranteed to converge to a stationary point 
of objective function F in (1.1). This lack of convergence has been pointed out in Pietersz & Groenen 
(20046, Section 2) and therefore it is not necessary to repeat the reasons here. 

Third, we consider the alternating projections algorithm of Grubisic (2002) and Morini & Webber 

(2004) . The discussion below of alternating projections applies only to the problem of rank reduction 
of correlation matrices. The method is based on alternating projections onto the set of n x n matrices 
with unit diagonal and onto the set of n x n matrices of rank d or less. Both these projections can be 
efficiently calculated. For projection onto the intersection of two convex sets, Dykstra (1983) and Han 
(1988) have shown that convergence to a minimum can be obtained with alternating projections onto the 
individual convex sets if a normal vector correction is applied. Their results do not automatically hold for 
an alternating projections algorithm with normal correction for Problem (1.1)^, since for < n the set of 
nx n matrices of rank d or less is non-convex. Indeed, Pietersz & Groenen (20046, Section 2) and Morini 
& Webber (2004) report that alternating projections with normal correction may fail in solving Problem 
(1.1). The alternating projections algorithm without normal correction stated in Grubisic (2002) and 
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Morini & Webber (2004) however always converges to a feasible point, but not necessarily to a stationary 
point. In fact, in general, the alternating projections method without normal correction does not converge 
to a stationary point. The algorithm thus does not minimize the objective function in (1.1), it only selects 
a feasible point satisfying the constraints of (1.1). 

The fourth important contribution is due to Higham (2002). The algorithm of Higham (2002) is the 
alternating projection algorithm with normal correction applied to the case d = n, i.e., to the problem 
of finding the nearest (possibly full-rank) correlation matrix. Note that the space of positive semidefinite 
matrices of rank n is indeed convex, therefore the alternating projections with normal correction method 
is guaranteed to converge to the global minimum. Since the case d < n is of primary interest in this paper, 
the alternating projections method with normal correction will not be considered in the remainder. 

Fifth, we mention the parametrization method of Rebonato (1999a, 19996, 1999c (Section 10), 
2002 (Section 9), 20046 (Sections 20.1-20.4)), Brigo (2002), Brigo & Mercuric (2001, Section 6.9) and 
Rapisarda, Brigo & Mercuric (2002). The set of correlation matrices of rank d or less {YY^ : Y G 
R"^'', diag(yF^)= /} is parameterized by trigonometric functions through spherical coordinates Yi = 
Yi{9i) with 9i e M"^"^. As a result, the objective value F{Y) becomes a function F{Y{6)) of the angle 
parameters 9 that live in M"^(''~i). Subsequently, ordinary non- linear optimisation algorithms may be 
applied to minimize the objective value F{Y{9)) over the angle parameters 9. In essence, this approach is 
the same as geometric optimisation, except for the key difference of optimising over 9 versus over Y. The 
major benefit of geometric optimisation over the parametrization method is as follows. Consider, for ease 
of exposition, the case of equal weights. The differential Fy, in terms of Y, is given simply as 2tpY, with 
tp = YY^ — C, see (6.1) below. Note that Fy = 2ipY can thus be efficiently calculated. The differential 
Fo, in terms of 9 however, is 2'4>Y rmiltiplied by the differential of Y with respcc;t to 9, by the chain rule 
of differentiation. The latter differential is less efficient to calculate since it involves numerous sums of 
trigonometric functions. 

An initial feasible point can be found by a principal component analysis and a suitable re-scaling. 
This method is explained in Section 6.5 below, and is due to Flury (1988). It first appeared in a finance 
setting in Sidenius (2000) and Hull & White (2000). The starting point for the five above mentioned 
algorithms is the re-scaled PCA initial feasible point. 

2.1. Weighted norms. We mention two reasons for assigning non-constant or non-homogeneous 

weights in the objective function of (1.2). First, in our setting C has the interpretation of measured 
correlation. It can thus be the case that we are more confident of specific entries of the matrix C . Second, 
the weighted norm of (1.2) has important applications in finance, see, for example, Higham (2002), 
Rebonato (1999c) and Pietersz & Groenen (2004&). 

The semi-norm in the objective function F can be (i) a Hadamard semi-norm with arbitrary weights 
per element of the matrix, as defined in (1.2), or (ii) a weighted Probenius norm || • \\f,q. with Q. a positive 
definite matrix. Here = tr(Xf7X-^f7). The weighted Probenius norm is, from a practical point 

of view, by far less transparent than the Hadamard or weights-per-entry semi- norm (1.2). The geometric 
optimisation theory developed in this paper, and most of the algorithms mentioned in Section 2, can be 
efficiently applied to both cases. The Lagrange multipliers and alternating projections methods however 
can only be efficiently extended to the case of the weighted Probenius norm. The reason is that both these 
methods need to calculate a projection onto the space of matrices of rank d or less. Such a projection, for 
the weighted Probenius norm, can be efficiently found by an eigenvalue decomposition. Por the Hadamard 
semi-norm, such an efficient solution is not available, to our knowledge, and as also mentioned in Higham 
(2002, page 336). 

3. Solution methodology with geometric optimisation. Note that Problem (1.1) is a special 
case of the following more general problem: 



In this paper methods will be developed to solve Problem (3.1) for the case when F is twice continuously 
differentiable. In the remainder of the paper, we assume d> 1, since for d = 1 the constraint set consists 
of a finite number (2"~^) of points. 



Find X e S, 



to minimize F{X) 
subject to rank(Ar) < d; Xu = 1, i 



(3.1) 



= l,...,n; XhO- 



Efficient Rank Reduction of Correlation Matrices 5 

3.1. Basic idea. The idea for solving Problem (3.1) is to parameterize tlie constraint set by a 
manifold, and subsequently utilize the recently developed algorithms for optimisation over manifolds, 
such as Newton's algorithm and conjugate gradient algorithms. Such geometric optimisation has been 
developed by Smith (1993). 

In Section 3.2, the constraint set is equipped with a topology, and we make an identification with 
a certain quotient space. In Section 3.3, it will be shown that the constraint set as such is not a man- 
ifold; however a dense subset is shown to be a manifold, namely the set of matrices of rank exactly d. 
Subsequently, in Section 3.4, we will define a larger manifold (named Cholesky manifold), of the same 
dimension as the rank-rf manifold, that maps surjectively to the constraint set. We may apply geometric 
optimisation on the Cholesky manifold. The connection between minima on the Cholesky manifold and 
on the constraint set will be established. 

3.2. Topological Structure. In this section, the set of nxn correlation matrices of rank d or less 
is equipped with the subspace topology from §„. We subsequently establish a homeomorphism (i.e., a 

topological isomorphism) between the latter topological space and the quotient space of n products of 
the d—1 sphere over the group of orthogonal transformations of M.'^. Intuitively the correspondence is as 
follows: We can associate with an nxn correlation matrix of rank d a configuration of n points of unit 
length in Mf^ such that the inner product of points i and j is entry of the correlation matrix. Any 

orthogonal rotation of the configuration does not alter the associated correlation matrix. This idea is 
developed more rigorously below. 

Definition 3.1. The set of symmetric nxn correlation matrices of rank at most d is defined by 

Cn,d = { X e S„ ; diag(X) = 7„, rank(X) <d, X ^ O}. 

Here In denotes the nxn identity matrix and diag denotes the map R"^" — > R"^", diag(X)jj = SijXij, 
where Sij denotes the Kronecker delta. 

The set Cum is a subset of S„. The latter space is eqtiipped with the Frobenius norm || • lli;-, which in 
turn defines a topology. We equip Cn,d with the subspace topology. 

In the following, the product of n unit spheres S'^~^ is denoted by Tn^d- Elements of Tn,d are denoted 
as a matrix Y G E"^**, with each row vector Yi of unit length. Denote by Od the group of orthogonal 
transformations of d-space. Elements of Od are denoted by a dxd orthogonal matrix Q. 

Definition 3.2. We define the following right Od-actior? on Tn.d-' 

Tn,d X Od ^ Tn^d, {Y,Q)^YQ. (3.2) 

An equivalence class {YQ : Q G Od} associated with Y G Tn^d is denoted by [Y] and it is called the orbit 
ofY. The quotient space Tn^d/Od is denoted by M^^d- The canonical projection Tn^d Tn^d/Od = Mn^d 
is denoted by w. Define the map'^ ^ as 

Mr,,d-^Cn,d, ^{[Y])=YY^. 
Consider a map $ in the inverse direction of 

Cn,d — ^ Mn,d, 

defined as follows: For X G Cn.d take Y G Tn^d such that YY^ = X . Such Y can always be found as will 
be shown in Theorem 3.3 below. Then set '^{X) = \Y]. It will be shown in Theorem 3.3 that this map is 
well defined. Finally, define the map s : T„^d Cn,d, s{Y) = YY^ . 

The following theorem relates the spaces Cn.d and Mn,d\ the proof has been deferred to Appendix A. 

Theorem 3.3. Consider the following diagram 




Mn,d = Tn,d/Od 



^It is trivially verified that the map thus defined is indeed an 0<j smooth action: YIj^ = Y and Y{QiQ2)~^ = 
(YQ2^)Qi~^ ■ Standard matrix multiplication is smooth. 

■* Although rather obvious, it will be shown in Theorem 3.3 that this map is well defined. 
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with the objects and maps as in Definitions 3.1 and 3.2. We have the following: 

(i) The maps 4* and $ are well defined. 

(ii) The diagram is commutative, i.e., ^ on = s and $ o s = tt. 
(Hi) The map ^ is a homeomorphism with inverse 

3.3. A dense part of M„ ^ equipped with a diflferentiable structure. For an exposition on 
differentiable manifolds, tlie reader is referred to do Carmo (1992). It turns out tliat Mn^d is not a manifold, 
but a so-called stratified space, see, e.g., Duistermaat & Kolk (2000). However there is a subspace of Mn,d 
that is a manifold, which is the manifold of equivalence classes of matrices of exactly rank d. The proof 
of the following proposition has been deferred to Appendix B. 

Proposition 3.4. LetT*^c Tn,d be the subspace defined by 

Kd ^{y&Tn,d : rank(F) =d}. 

Then we have the following: 

1. T* ^ is a sub-manifold ofTn^d- 

2. Denote by M* ^ the quotient space T* ^/Od. Then M* ^ is a manifold of dimension n{d — 1) — 
d{d -l)/2. 

As shown in Proposition 3.4, a subset M* ^ of M„ is a manifold. In the following, we will study 
charts given by sections of the manifold M* ^ that will ultimately lead to the final manifold over which 
will be optimized. 

A section on M* is a map a : U ^ T* ^, with U open in M* such that tt o a = icIm' ^- Such 
a map singles out a unique matrix in each equivalence class. In our case we can explicitly give such 
a map a. Let [Y] in M* ^, and let / denote a subset of {1, . . . ,n} with exactly d elements, such that 
dim(span({yi : i € I})) = d, for Y G [Y]. Note that I is weU defined since any two y(i),y(2) g ^y] are 
coupled by an orthogonal transformation, see the proof of Theorem 3.3, and orthogonal transformations 
preserve independence. The collection of all such I is denoted by Ty ■ It is readily verified that Ty is not 
empty. Let -< denote the lexicographical ordering, then (Xy, ^) is a well-ordered set. Thus wc can choose 
the smallest element, denoted by J{Y) = (ji, . . . ,jd). Define Y G R''-^'^ by taking the rows of Y from Jy, 
thus Yi = Yj.. Define X = YY^ . Since X is positive definite, Cholesky decomposition can be applied to 
X, see for example Golub & van Loan (1996, Theorem 4.2.5), to obtain a unique lower- triangular matrix 
y such that yy^ = X and Yu > 0. By Theorem 3.3, there exists a unique orthogonal matrix Q & Od 
such that y = YQ. Define Y* = YQ. Note that Y* is lower-triangular, since for i ^ Jy, let p be the 
largest integer such that i > jp, then Y* is dependent on Y^ , . . . , Y* , as .Jy is the smallest element from 
Ty, which implies a lower-triangular form for Y* . Then define Uy ^ {[Z] : J{Y) e Tz} C M*^. It is 
obvious that Uy and iT~^iply) are open in the corresponding topologies. Then 

ay - Uy Tl'^iUy), [Z] ^ Z* , (3.4) 

is a section of Uy at Y . The following proposition shows that the sections are the charts of the manifold 
M* ^. The proof has been deferred to Appendix C. 

Proposition 3.5. The differentiable structure on M*^ is the one which makes ay : Uy — > a{Uy) 

into a diffeomorphism. 

3.4. The Cholesky manifold. In this section, we will show that, for the purpose of optimisation, 
it is sufficient to perform the optimisation on a compact manifold that contains one of the sections. For 
simplicity we choose the section ay where J{Y) = {1, . . . , d}. The image ayilAy) is a smooth sub-manifold 
of T„ d with the following representation in M"^'' 

[{Y^ ... Yj f : yi = (1,0,...,0); YeS!^\ i = 2,...,d; Yi e S''-\ i = d+l,...,n}, 

with S'^~^ embedded in R'^ by the first i coordinates such that coordinate i is bigger than and with the 
remaining coordinates set to zero. Also, S"^^^ is similarly embedded in R"^. We can consider the map 
s ■ Tn^d Cn,d restricted to ayiUy), which is differentiable since ay{}Ay) is a sub-manifold of Tn^d- The 
map s|cry(Ui') is a homeomorphism, in virtue of Theorem 3.3. 

For the purpose of optimisation, we need a compact manifold which is surjective with Cn,d- Define 
the following sub-manifold of Tn^d of dimension n{d — 1) — d{d — l)/2, 

Chol„,d = { y e K"^^'^ : yi = (l,0,...,0); Yi & S'-^ , i = 2,...,d; Yi € S'^-\ i = + 1, . . . , n }, 
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which wc caU the Cholcsky manifold. The Cholcsky paramctrization has been considered before by 
Rapisarda et al. (2002), but these authors do not consider non-EucUdean geometric optimisation. The 
map s|choi„,d is surjective, in virtue of the following theorem, the proof of which has been relegated to 
Appendix D. 

Theorem 3.6. If X G Cn,d, then there exists a F e Chol„,rf such that YY'^ = X. 
A function F on Cn,d can be considered on Chol„,d, too, via the composition 

Chol„,d A ^ M, F ^ YY^ ^ F(YY^). 

From here on, we will write F{Y) := F{YY'^) viewed function on Choi,, ,1. 

For a global minimum F{Y) on Choln^d, we have that YY^ attains a global minimum of F on C„^cL-, 
since the map s : Cholrj,d — > Cn,d is surjective. For a local minimum, we have the following theorem. The 
proof has been deferred to Appendix E. 

Theorem 3.7. The point Y attains a local minimum of F on Chol„,d if and only ifYY^ attains a 
local minimum of F on Cn^a- 

These considerations on global and local minima on Choln,d show that, to optimize F over Cn,d, we 
might as well optimize F over the manifold Chol„,d. For the optimisation of F over Cn,d, there is no 
straightforward way to use numerical methods such as Newton and conjugate gradient, since they require 
a notion of differentiability, but for optimisation of F on Chol^^d, we can use such numerical methods. 

3.5. Choice of representation. In principle, we could elect another manifold M and a surjective 
open map M Cn,d- We insist however on explicit knowledge of the geodesies and parallel transport, 
for this is essential to obtaining an efficient algorithm. We found that if we choose the Cholesky manifold 
then convenient expressions for geodesies, etc., arc obtained. Moreover, the Cholesky manifold has the 
minimal dimension, i.e., dim(Chol„,(i) = dim(M*_^). 

In the next section, the geometric optimisation tools are developed for the Cholesky manifold. 

4. Optimisation over the Cholesky manifold. For the development of minimization algorithms 

on a manifold, certain objects of the manifold need to calculated explicitly, such as geodesies, parallel 
transport, etc. In this section, these objects are introduced and made explicit for Chol„,d- 

From a theoretical point of view, it does not matter which coordinates we choose to derive the geo- 
metrical properties of a manifold. For the numerical computations however this choice is essential because 
the simplicity of formulas for the geodesies and parallel transport depends on the chosen coordinates. We 
found that simple expressions are obtained when Chol„,d is viewed as a sub-manifold of Tn^, which, in 
turn, is viewed as a subset of the ambient space M"^''. This representation reveals that, to calculate 
geodesies and parallel transport on Chol„,d, it is sufficient to calculate these on a single sphere. 

The tangent space of the manifold Chol„,d at a point Y e Chol^^d is denoted by TyChol„,d. A tangent 
vector at a point Y is an element of TyGholn^d and is denoted by A. 

4.1. Riemannian structure. We start with a review of basic concepts of Ricmannian geometry. 
Our exposition follows do Carmo (1992). Let M be an m-dimensional differentiable manifold. A Rie- 
mannian structure on M is a smooth map Y 1— > (•, ■)y, which for every Y & M assigns an inner product 
(•, ■)y on TyM, the tangent space at point Y. A Riemannian manifold is a differentiable manifold with 
a Riemannian structure. 

Let F be a smooth function on a Riemannian manifold M. Denote the differential of F at a point 

Y by Fy- Then Fy is a linear functional on TyM. In particular, let j{t), t G {—e,e), be a smooth curve 
on M such that 7(0) = Y and 7(0) expressed in a coordinate chart {U,xi, . . . ,Xm) is equal to A, then 
Fy{A) can be expressed in this coordinate chart by 

1=1 

The linear space of linear functionals on TyM (the dual space) is denoted by {TyM)*. A vector field is 
a map on M that selects a tangent A G TyM at each point Y E M. The Ricmannian structure induces 
an isomorphism between TyM and {TyM)*, which guarantees the existence of a unique vector field on 
M, denoted by gradi^, such that 

Fy{A) = (gradF, X)y for all X e TyM. (4.2) 
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This vector field is called the gradient of F. Also, for Newton and conjugate gradient methods, we have 
to use second order derivatives. In particular, we need to be able to differentiate vector fields. To do this 
on a general manifold, we need to equip the manifold with additional structure, namely the connection. 
A connection on a manifold M is a rule V.- which assigns to each two vector fields Xi,X2 on M a vector 
field VX1X2 on M, satisfying the following two conditions: 

Vfxi+gx.^3 = FVx^Xs + GVx.Xs, VxAFX^) - FVx^{X2) + {X,F)X2, (4.3) 

for F, G smooth functions on M and Xi, X2, X^ vector fields on M . 

Let 7(i) be a smooth curve on M with tangent vector Xi{t) = j{t). A given family X2{t) of tangent 
vectors at the points 7(4) is said to be parallel transported along 7 if 

\7x,X2 = onj{t), (4.4) 

where Xi,X2 are vector fields that coincide with Xi{t) and X2{t), respectively, on 7(t). If the tangent 
vector Xi{t) itself is parallel transported along j{t) then the curve 7(t) is called a geodesic. In particular, 
if {U, xi,. . . , Xm) is a coordinate chart on M and {Xi, . . . , X^} the corresponding vector fields then the 
afEne connection V on W can be expressed by 

m 

Vx,Xi = Y.^liXk. (4.5) 

fc=i 

The functions T^j are smooth functions, called the Christoffel symbols for the connection. In components, 
the geodesic equation becomes 

m 

^k+Yl ^ij'^i^j = 0' (4-6) 

where Xk are the coordinates of j{t). On a Riemannian manifold there is a unique torsion free connection 
compatible with the metric, called the Levi-Civita connection. This means that Christoffel symbols can 
be expressed as functions of a metric on M. Note also that (4.6) implies that, once we have determined 
the equation for the geodesic we can simply read off Christoffel symbols. With respect to an induced 
metric the geodesic is the curve of shortest length between two points on a manifold. For a manifold 
embedded in Euclidean space an equivalent characterization of a geodesic is that the acceleration vector 
at each point along a geodesic is normal to the manifold so long as the curve is traced with uniform speed. 

We start by defining Riemannian structures for Tn,d and for the Cholesky manifold Chol„,d. We 
use the Levi-Civita connection, associated to the metric defined as follows on the tangent spaces. Both 
tangent spaces are identified with suitable subspaces of the ambient space R"^'', and subsequently the 
inner product for two tangents Ai, A2 is defined as 

(Ai,A2)-trAiAf, (4.7) 

which is the Frobenius inner product for n x d matrices. Note that, in our special case, the inner product 
{■,-)y is independent of the point Y; therefore we suppress the dependency on Y. 

4.2. Normal and tangent spaces. An equation determining tangents to T„^d at a point Y can be 

obtained by differentiating diag(ry^) = /„ yielding diag(rA^ + AF^) = 0, i.e., diag(Ay'^) = 0. The 
dimension of the tangent space is n{d— 1). The normal space at the point Y is defined to be the orthogonal 
complement of the tangent space at the point Y, i.e., it consists of the matrices N, for which trAA''-^ = 
for all A in the tangent space. It follows that the normal space is n dimensional. It is straightforward to 
verify that if A'^ = DY, where D is nxn diagonal, then N is in the normal space. Since the dimension of 
the space of such matrices is n, we see that the normal space NyTn^d at F e Tn^d is given by 

NYTn,d = { £>y ; £> e M"^" diagonal }. 

The projections 7rjVyT„,<j and i^TYTn,d onto the normal and tangent spaces of are given by 

7rjv^T„,,(A) = diag(Ay^)y and 7rT^T„,,(A) = A - diag(AF^)y, 

respectively. The projection 7rTyChoi„ ^ onto the tangent space of Chained is given by 



7rTvChoi„.,(A) = C(7rT^T„.,(A) ) with C(A) defined by C(A)y = | 



for j > i or i = j = 1, 
Ajj otherwise. 

(4.6 
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4.3. Geodesies. It is convenient to work with the coordinates of the ambient space M"^'*. In this 
coordinate system, geodesies on T^^d with respect to the Levi-Civita connection obey the second order 
differential equation 

Y + Ty(Y, Y) = 0, with ry(Ai, A2) := diag(AiAj)y. (4.9) 
To see this, we begin with the condition that Y{t) remains on Tn^a, 

diag(yy^) = /„. (4.10) 

Differentiating this equation twice, we obtain, 

diag(yy'^ + 2^^"^ + yy"^) = 0. (4.11) 

In order for Y{-) to be a geodesic, Y{t) must be in the normal space at Y{t), i.e., 

Y{t) = D{t)Y{t) (4.12) 

for some diagonal matrix D{t). To obtain an expression for D, substitute (4.12) into (4.11), which yields 
(4.9). 

The function Ty is the matrix notation of the Christoffel symbols, T^j , with respect to , . . . , End, the 
standard basis vectors of M"^''. More precisely, VE^Ej = Ylkii ^ij^k with T^j defined by (rr(Xi, X2), Ek 

) = ELtir?.(^i)^(^2),. 

The geodesic at Y{0) e Tn,d in the direction A e TY(o)Tn,d is given by, 

Yi{t) = cos( ||A,||i ) Yi{0) + ^ sin ( \\Ai\\t ) A^. (4.13) 

for i = 1, . . . , n, per component on the sphere. By differentiating, we obtain an expression for the evolution 
of the tangent along the geodesic: 

Yi{t) = -IIAill sin ( \\Ai\\t ) Yi{0) + cos ( ||Ai||i ) A^. (4.14) 

Since Choln,d is a Riemannian sub-manifold of Tn^d it has the same geodesies. 

4.4. Parallel transport along a geodesic. We consider this problem per component on the sphere. 
If A^^^ S 7V(i)T„ d is parallel transported along a geodesic starting from Y^^^ in the direction of A*^^^ S 
Ty(i)T„_(i, then decompose A^^^ in terms of A^^^ 

Af\t) = (aP(0), Af (0))AP(t) + i?i±Af )(0). 

Then A^^-'(t) changes according to (4.14) and Ri remains unchanged. Parallel transport from Y^^^ to y^^' 
defines an isometry t{Y'^^\Y^^^) : Ty^T^^d ^ TY(2}Tn^d- When it is clear in between which two points is 
transported, then parallel transport is denoted simply by r. Since Chol^^d is a Riemannian sub-manifold 
of Tn^d it has the same equations for parallel transport. 

4.5. The gradient. Since Chol„,d is a sub-manifold of M"^'' we can use coordinates of M"^*^ to 
express the diffcrcintial Fy of F at the point Y, namely {Fy)ij = -§^- The gradient gradi^ of a function 
F on Chol„,d can be determined by (4.2). It follows that, 

gradi^ = 7rT^choi„,,(i^F) =C{Fy- diag(Fy y^)y ) . (4.15) 

4.6. Hessian. The Hessian HessF of a function is a second covariant derivative of F. More 
precisely, let Ai, A2 be two vector fields, then 

HessF(Ai, A2) = (Vai gradF, A2) 

In local coordinates of M"^"^ 

Hessii^(Ai, A2) = i^v-y(Ai, A2) - (fy, rr(Ai, A2)), (4.16) 
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where 

d 



„ , . . N d d 



with A 



Y = A„ 



y = A2. 

Newton's method requires inverting the Hessian at minus the gradient, therefore we need to find the 
tangent A to Chol„,d such that 

HessF(A, X) = {-giSidF, X), for all tangents X to Chol„,d. (4.17) 

To solve (4.17), it is convenient to calculate the unique tangent vector H = W(A) satisfying 

HessF(A,X) = {n,X), for all tangents X to Chol„,d, 

since then the Newton Equation (4.17) becomes W(A) = — gradF. Prom (4.9) and (4.16), we obtain 

W(A) = 7rT^choU,,(Fyy(A)) - dia.g{FYY^)A, (4.18) 

where the notation Fyy{A) means the tangent vector satisfying 

d 



FY{Y{t)), Y{0) = A. 

=0 

4.7. Algorithms. We are now in a position to state the conjugate gradient algorithm, given as Al- 
gorithm 1, and the Newton algorithm, given as Algorithm 2, for optimisation over the Cholesky manifold. 
These algorithms are instances of the geometric programs presented in Smith (1993), for the particular 
case of the Cholesky manifold. 

5. Discussion of convergence properties. In this section, we discuss convergence properties of 
the geometric programs: global convergence and the local rate of convergence. 

5.1. Global convergence. First, we discuss global convergence for the Riemannian-Newton algo- 
rithm. It is well known that the Newton algorithm, as displayed in Algorithm 2, is not globally convergent 
to a local minimum. Moreover, the steps in Algorithm 2 may even not be well defined, because the Hessian 
mapping could be singular. The standard way to resolve these issues, is to introduce jointly a steepest 
descent algorithm. So Algorithm 2 is adjusted in the following way. When the new search direction A'^'') 
has been calculated, then wc also consider the steepest descent search direction Ag^),pp = — gradF(y('^'). 

Subsequently, a line minimization of the objective value is performed in both directions, A^'^^ and Ag^^^p. 
We then take as the next point of the algorithm whichever search direction finds the point with lowest 
objective value. Such a steepest descent method with line minimization is well known to have guaranteed 
convergence to a local minimum. 

Second, we discuss global convergence for conjugate gradient algorithms. For the Riemannian case, 
we have not seen any global convergence results for conjugate gradient algorithms in the literature. 
Therefore we focus on the results obtained for the fiat-Euclidean case. Zoutendijk (1970) and Al-Baali 
(1985) establish global convergence of the Fletcher & Reeves (1964) conjugate gradient method with line 
minimization. Gilbert & Nocedal (1992) establish alternative line search minimizations that guarantee 
global convergence of the Polak & Ribicrc (1969) conjugate gradient method. 

5.2. Local rate of convergence. Local rates of convergence for geometric optimisation algorithms 
are established in Smith (1993), Edelman, Arias & Smith (1998) and Dedieu, Priouret & Malajovich 
(2003). 

In Theorem 3.3 of Smith (1993), the following result is established for the Riemannian-Ncwton 
method. If y is a non-dcgencratc stationary point, then there exists an open set U containing Y, such 
that starting from any y(°) in U, the sequence of points produced by Algorithm 2 converges quadratically 

to y. 

In Theorem 4.3 of Smith (1993), the following result is stated for the Riemannian Fletcher & Reeves 
(1964) and Polak & Ribicre (1969) conjugate gradient methods. Suppose y is a non-degenerate stationary 
point such that the Hessian at Y is positive definite. Suppose {y^-'^j^o is a sequence of points, generated 
by Algorithm 1, converging to Y. Then, for suSiciently large j, the sequence {y^-'^j^Q has dim(Chol„,d)- 

steps quadratic convergence to Y. 

As a numerical illustration, convergence runs have been displayed in Figure 5.1, for reducing a 10 x 10 
correlation matrix to rank 3. The following algorithms are compared: 
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Algorithm 1 Conjugate gradient for minimizing F{Y) on Chol„,d 



Input: Y^°\F{-). 

Require: Y^°^ such that y(")(y("))3^ = /„. 

Compute G(o) = gradi^(y(o)) = C{Fy - diag(Fr (F(°))'^)r(o)) and set H^°^ = -G^. 
for fc = 0,l,2, ... do 

Minimize over t where Y^'^^ {t) is a geodesic on Chol„,d starting from Y^'^^ in the direction 

of ijW. 

Set tk = tmin and y^'^+i) = Y^^Htk). 

Compute Gf'^+i) = gradi^(y('=+i)) ^ ({ Fy - diag(i^Y(y('=+i))'^)y('=+i) ). 
Parallel transport tangent vectors iJ^*^^ and G^''^ to the point F^'^+i). 
Compute the new search direction 



^(fe+i) ^ _Q{k+i) ^i^^H(k) , 



7fc = ^ {GO'?,G('^?) Polak & Ribiere (1969), 



7fc 



Fletcher & Reeves (1964). 



Reset = -GC^+i) if A: + 1 = mod n{d - 1) - id((i - 1). 

end for 



Algorithm 2 Newton's method for minimizing F(Y) on Chol„,d. 
Input: Y^°\F{-). 

Require: y(o) such that diag(y(o)(r(o))^) = /„. 
for fc = 0, 1,2, ... do 

Compute G(*=) = gradF(y('=)) ^ ({ Fy ~ diag(Fyy'^)r ). 
Compute A('=) = -H'^G^''^ i.e. AC'^) e TyChol„,d and 

C( Fyy{A^>'^) - diag ( Fyy{A('''>){Y(''Y )y('=) ) - diag(i?^y (y('=))^)A('=) = -GC^). 

Move from y^*^^ in direction A^*^^ to y('^)(l) along the geodesic. 
Set =fW(i). 
end for 



1. Steepest descent, for which the search direction i/C^+i) in Algorithm 1 is equal to —G^''~^^\ i.e., 
to minus the gradient. The steepest descent method has a linear local rate of convergence, see 
Smith (1993, Theorem 2.3). 

2. PRCG, Polak- Ribiere conjugate gradient. 

3. FRCG, Fletcher-Reeves conjugate gradient. 

4. Newton. 

5. Lev. -Mar., the Levenberg (1944) & Marquardt (1963) method, which is a Newton-type method. 
The code that is used for this test is the package 'LRCM min', to be discussed in Section 7. This package 
also contains the correlation matrix used for the convergence run test. Figure 5.1 clearly illustrates the 
convergence properties of the various geometric programs. The efhciency of the algorithms is studied in 
Section 7 below. 

6. A special case: Distance minimization. In this section, the primary concern of this paper to 
minimize the objective function of (1.2) is studied. The outline of this section is as follows. First, some 
particular choices for n and d are examined. Second, the differential and Hessian of F are calculated. 
Third, the connection with Lagrange multipliers is stated; in particular, this will lead to an identification 
method of whether a local minimum is a global minimum. Fourth, we discuss the PCA with re-scaling 
method for obtaining an initial feasible point. 

6.1. The case of d = n. The case that G is a symmetric matrix and the closest positive semidefinite 
matrix X is to be found allows a successive projection solution, which was shown by Higham (2002). 
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20 40 60 80 100 120 140 160 180 200 

Iterate (i) 



Fig. 5.1. Convergence runs for various geometric programs: log relative residual Zn(||gradF(y(*')||/||grad_F(y(''^)||) 
versus the iterate i. 

6.2. The case of d = 2, n = 3. A 3 x 3 symmetric matrix with ones on the diagonals is denoted by 

/ I X y \ 
\ X 1 ^ • 
\V z I I 

Its determinant is given by 

det = -{x^ +y'^ + ] + 2xyz + 1. 

By straightforward calculations it can be shown that det = implies that all eigenvalues are nonnegative. 
The set of 3 by 3 correlation matrices of rank 2 may thus be represented by the set {det = 0}. To get an 
intuitive understanding of the complexity of the problem, the feasible region has been displayed in Figure 
1.1. 

6.3. Formula for the differential of F. Consider the specific case of the weighted Hadamard semi- 
norm of (1.2). This semi-norm can be represented by a Frobenius norm by introducing the Hadamard 
product o. The Hadamard product denotes entry-by-entry multiplication. Formally, for two matrices A 
and B of equal dimensions, the Hadamard product Ao B is defined by {A o B)ij — AijBij. The objective 
function (1.2) can then be written as 

i<j 

with V := YY^ - C and with {W°^^^)ij = Then 

^F(Y{t)) = (W^°i/2 o ^, W^°i/2 o V) = (V-, o V) 
at 

= {AY'^, Wo^) + {YA^, Woi;) 
- (A,2(M^o^)y) = (A,Fy), VA. 

Thus from (4.1) we have 



Fy = 2{W o ^Ij)Y. 



(6.1) 
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Similarly, we may compute the second derivative 




with Y(-) any curve starting from Y in the direction of A. 

6.4. Connection normal with Lagrange multipliers. The following lemma provides the basis 
for the connection of the normal vector at Y versus the Lagrange multipliers of the algorithm of Zhang 
& Wu (2003) and Wu (2003). The result is novel since previously only an expression was known for 
the matrix Y given the Lagrange multipliers. The result below establishes the reverse direction. This 
Lagrange result will allow us to identify whether a local minimum is also a global minimum. That we are 
able to efBciently determine whether a local minimum is a global minimum, is a very rare phenomenon 
in non-convex optimisation, and makes the rank reduction problem (non-convex for d < n) all the more 
interesting. 

Note that the Lagrange theory is based on an efficient expression of the low-rank projection by an 
eigenvalue decomposition. Therefore the theory below can be extended efficiently only for the Hadamard 
norm with equal weights and for the weighted Probenius norm, see also the discussion in Section 2.1. The 
proof of the following lemma has been deferred to Appendix F. 



Lemma 6.1. Let Y e Tn^d be such that gradi^(y) = 0. Here, gradF is the gradient of F on Tn^d, 
gradF(y) = ttt^t^^APy) = Fy - diag{FYY'^)Y , with Fy in (6.1). Define 



where D* can be obtained by selecting at most d nonnegative entries from D (here if an entry is selected 
it retains the corresponding position in the matrix). 

The characterization of the global minimum for Problem (1.1) was first achieved in Zhang & Wu (2003) 
and Wu (2003), which we repeat here: Denote by {X}d a matrix obtained by eigenvalue decomposition 
of X together with leaving in only the d largest eigenvalues (in absolute value). Denote for A e M": 
C(A) = C + diag(A). The proof of the following theorem has been repeated for clarity in Appendix G. 

Theorem 6.2. (Characterization of the global minimum of Problem (1.1), see Zhang & Wu (2003) 
and Wu (2003)) Let C be a symmetric matrix. Let A* be such that there exists {C + diag(A*)}d G Cn,d 
with 



Then {C + diag(A*)}£i is a global minimizer of Problem (l.l). 

This brings us in a position to identify whether a local minimum is a global minimum: 

Theorem 6.3. Let Y e r„,d be such that gradi^(y) = on Tn^. Let A and C(A) be defined as in 

Lemma 6.1. IfYY'^ has the d largest eigenvalues from C(A) (in absolute value) then YY^ is a global 

minimizer to the Problem (l-l). 

Proof: Apply Lemma 6.1 and Theorem 6.2. □ 

6.5. Initial feasible point. To obtain an initial feasible point Y e T„^d we use a method of Flury 
(1988). We first perform an eigenvalue decomposition 



A := i diag ( FyY^ ) 



and define C(A) := C + A. Then there exist a joint eigenvalue decomposition 



C{X) = QDQ^, YY^ = QD*Q'^ 



diag ( {C + diag(A*)}d ) = diag(C). 



(6.2) 



C = QAQ^, All > • • • > A, 



(6.3) 



Then we define Y by assigning to each row 



Yi = 



— ,Z={QdAl/^){i,:), i = l,...,n, 



where Qd consists of the first d columns of Q and where Ad+ is the principal sub-matrix of A of degree 
d, filled only with the non-negative elements from A. The scaling is to ensure that each row of Y is of 
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Table 7.1 

Excerpt of Table 3 in De Jong et al. (2004). 





71 


72 


73 


74 


estimate 


0.000 


0.480 


1.511 


0.186 


standard error 




0.099 


0.289 


0.127 



unit length. If row z is a priori of zero length, then we choose Yi to be an arbitrary vector in R"^ of unit 
length. Finally, to obtain an initial feasible point in Chol„.d, we perform a Cholesky decomposition as in 
the proof of Theorem 3.6. 

Note that the condition of decreasing norm in (6.3) is thus key to ensure that the initial point is close 
to the global minimum, see the result of Theorem 6.3. 

7. Numerical results. There are many different algorithms available in the literature, as detailed 

in Section 2. Some of these have an efficient implementation, i.e., the cost of a single iteration is low. Some 
algorithms have fast convergence, for example, the Newton method has quadratic convergence. Algorithms 
with fast convergence usually require less iterations to attain a predefined convergence criterion. Thus, the 
real-world performance of an algorithm is a tradc-off between cost-pcr-iterate and number of iterations 
required. A priori, it is not clear which algorithm will perform best. Therefore, in this section, the 
numerical performance of geometric optimisation is compared to other methods available in the literature. 

7.1. Acknowledgement. Our implementation of geometric optimisation over low-rank correlation 

matrices 'LRCM min'^ is an adoption of the 'SG min' template of Edclman & Lippert (2000) (written in 
MATLAB) for optimisation over the Stiefel and Grassmann manifolds. This template contains four dis- 
tinct well-known non-linear optimisation algorithms adapted for geometric optimisation over Riemannian 

manifolds: Newton algorithm; dogleg step or Lcvenberg (1944) and Marquardt (1963) algorithm; Polak 
& Ribierc (1969) conjugate gradient; and Fletcher & Reeves (1964) conjugate gradient. 

7.2. Numerical comparison. The performances of the following seven algorithms, all of these 
described in Sections 4 and 2, except for item 7 (fmincon), are compared: 

1. Geometric optimisation, Newton (Newton). 

2. Geometric optimisation, Fletcher- Reeves conjugate gradient (FRCG). 

3. Majorization, e.g., Pietersz & Groenen (20046) (Major.). 

4. Parametrization, e.g., Rebonato (1999&) (Param.). 

5. Alternating projections without normal vector correction, e.g., Grubisic (2002) (Alt. Proj.). 

6. Lagrange multipliers, e.g., Zhang & Wu (2003) (Lagrange). 

7. fmincon, a MATLAB built-in medium-scale constrained nonlinear program (fmincon). 

Note that the first two algorithms in this list have been developed in this paper. The algorithms are tested 
on a large number of randomly generated correlation matrices. The benefit of testing on many correlation 
matrices is, that the overall and generic performance of the algorithms may be assessed. The correlation 
matrices are randomly generated as follows. A parametric form for (primarily interest rate) correlation 
matrices is posed in De Jong, Driessen & Pelsser (2004, Equation (8)). We repeat the parametric form 
here for completeness. 

p(T.,T,)=exp{ -7im-T,| - ^g^L -7i|x/^-x/^| }, 

with 71,72,74 > and with Ti denoting the expiry time of rate i. (Our particular choice is Ti = i, 
i = 1,2,...) This model was then subsequently estimated with USD historical interest rate data. In 
Table 3 of De Jong et al. (2004) the estimated 7-parameters are listed, along with their standard error. 
An excerpt of this table has been displayed in Table 7.1. The random correlation matrix that we use is 
obtained by randomizing the 7-parameters. We assume the 7-parameters distributed normally with mean 
and standard errors given by Table 7.1, with 71,72,74 capped at zero. 

As the benchmark criterion for the performance of an algorithm, we take its obtained accuracy of 
fit given a fixed amount of computational time. Such a criterion corresponds to financial practice, since 
decisions based on derivative valuation calculations often need to be made within seconds. To display the 



^LRCM min can be downloaded from www.few.eur.nl/feM/people/pietersz. 
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Fig. 7.1. Performance profile with n = 30, ci = 3, 2 seconds of computational time, Hadamard norm with equal 
weights. A rule of thumb is, that the higher the graph of an algorithm, the better its performance. 



comparison results, we use the state-of-the-art and convenient performance profiles] see Dolan & More 
(2002). The reader is referred there for details, but the idea is briefly described here. There are 100 test 
correlation matrices p — 1, . . . , 100, and seven algorithms s = 1, . . . , 7. As performance measure we take 
the obtained function value F'-P'^^ of algorithm s on problem p given the limited computational time. The 
performance ratio p^P''*) is defined to be the ratio of the performance measure of the algorithm over the 
best obtained performance measure for all seven algorithms, 

_F(p,s) 



iin{F(P''') : s = 1, . . . , 7} ■ 

The cumulative distribution function O'-''-' of the performance ratio for algorithm s, viewed as a random 
variable p — > p^P''^\ is then the performance profile of algorithm s, 

^^"HO = : P'^"''^ <tp=l, 100}. 

A rule of thumb is, that the higher the profile of an algorithm, the better its performance. The performance 
profiles have been displayed in Figures 7.1-7.4, for various choices of n, d, and computational times. Each 
performance profile represents a benchmark on 100 different test interest rate correlation matrices. For 
Figures 7.1-7.3, an objective function with equal weights is used. For Figure 7.4, we use a Hadamard 
semi-norm with non-constant weights. These weights are chosen so as to reflect the importance of the 
correlation entries for a specific trigger swap, as outlined in, e.g., Rebonato (20046, Section 20.4.3). For 
this specific trigger swap, the first three rows and columns arc important. Therefore the weights matrix 
W takes the form 



f 1 if i < 3 or j < 3, 
y otherwise. 



From Figures 7.1-7.4 it becomes clear that geometric optimisation compares favourably to the other 
methods available in the literature, with respect to obtaining the best fit to the original correlation matrix 
within a limited computational time. 
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Performance ratio ^ 

Fig. 7.2. Performance profile with n = 50, d = A, 1 second of computational time, Hadamard norm with equal weights. 

8. Conclusions. We applied geometric optimisation tools for finding the nearest low-rank correla- 
tion matrix. The differential geometric machinery provided us with an algorithm more efficient than any 
existing algorithm in the literature, at least for the numerical cases considered. The geometric approach 
also allows for insight and more intuition into the problem. We established a technique that allows one 
to straightforwardly identify whether a local minimum is a global minimum. 
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Appendix A. The proof of Theorem 3.3. 

Proof of (i). The maps \E' and $ are well defined: To show that \1/ is well defined, we need to show 
that if Y2 e [Yi], then Y2Y^ = Y^Y^ . Prom the assumption, we have that e Od : F2 = YxQ. If 
follows that 

Fsif = (Fig)(yiQf = Y^QQ^Y^ = Y,Y^, 

which was to be shown. 

To show that $ is well defined, wc need to show: 

(A) If X e Cn,d then there exists Y e T„,d such that X = YY^ . 

(B) If y, Z e Tn,d, with = ZZ^ =-. X then there exists Q G Od such that Y = ZQ. 
Ad (A): Let 

X = QAQ^, QGOn, A = diag(A), 

be an eigenvalue decomposition with A^^ = for z = + 1, . . . , n. Note that such a decomposition of the 
specified form is possible because of the restriction X £ Cn,d- Then note that 

QVA={ {Qy/A){:,l : d) | ). 

Thus if we set Y = (QVA)(:, 1 : d) then YY'^ = X and Y e Tn.d, which was to be shown. 

Ad (B): Let rank(y) = rank(.^) = rank(X) = k < d. Without loss of generality, we may assume 
that the first k rows of Y and Z are independent. We extend the set of k row vectors {yi, . . . , yfc} to 
a set of d row vectors {Yi, . . . , y^, y^+i, . . . , Yd}, such that the latter forms a basis of R''. Similarly, we 
obtain a basis {Zi, . . . , Zk, Zk+i, ■ ■ ■ , Zd} of M.'^. It follows that there exists an orthogonal rotation Q, 
QQ"^ = such that QYi = Zi {i = 1, . . . ,k), QYi = Zi {i = k + 1, . . . ,d). Note that then also QYi = Zi 
ioi i = k + 1 , . . . , n, hy linearity of Q and since the last n — k row vectors are linearly dependent on the 
first k row vectors by assumption. It follows that YQ = Z, which was to be shown. □ 

Proof (a). Diagram (3.3) is commutative: To show that ^ o tt = s: Let Y € Tn^d, then n{Y) = [Y] 
and «'([y]) = yy^ and also s{Y) = YY'^ . To show that $ o s = tt: Let y e T„,d, then s{Y) = YY'^ and 
$(yy^) = [Y] and also 7r(y) = [Y]. ' □ 

Proof of (Hi). The map ^ is a homeomorphism with inverse It is straightforward to verify 
that <I> o and o <I> are both the identity maps. The map ^ is thus bijective with inverse To 
show that ^' is continuous, note that for quotient spaces we have: The map ^' is continuous if and only 
if o TT is continuous (see for example Abraham, Marsden & Ratiu (1988), Proposition 1.4.8). In our 
case, o TT = s with s{Y) = YY^ is continuous. The proof now follows from a well-known lemma from 
topology: A continuous bijection from a compact space into a Hausdorff space is a homeomorphism (see 
for example Munkres (1975), Theorem 5.6). □ 

Appendix B. Proof of Proposition 3.4. 

1. It is sufficient to show that {Y e M"^^ : rank(y) = d} is open in R"^'^, since T* ^ is open in Tn,d 
if and only if {Y £ R"^'^ : rank(y) = d} is open in M"^''. Since the rank of a symmetric matrix 
is a locally constant function, it follows that {Y G K"^"^ : rank(y) = d} = s~^(rank~^(d)) is an 
open subset of R"^'', with s(Y) = YY'^ as in Definition 3.2. □ 

2. This part is a corollary of Theorem 1.11.4 of Duistermaat & Kolk (2000). This theorem essentially 
states that for a smooth action of a Lie group on a manifold the quotient is a manifold if the 
action is proper and free. First, we show that the action of Od on T* ^ is proper^. Let 

<i> : Tld X Od - Tld X Kd, {Y, Q) - {YQ, Y) 

and K a compact subset of T* ^ x T* ^. We have to show that (l>~^{K) is compact. By continuity 
of (/), 4>~'^{K) is closed in T*^ x O^. Because T*^xOd is bounded it follows that (f>~^{K) is 
compact. 

Second, we show that the Od-action on T* ^ is free. Let y e T* ^ and Q £ Od such that YQ = Y. 
Since rank(y) = d, it follows from the proof of Theorem 3.3 (i) that there exists precisely one 
Q £ Od such that YQ = Y. Thus, this Q must be the identity matrix. 

The dimension of M*^^ = dim(T*^) - dim(Od) = n{d - 1) - i(i(<i - 1). □ 



®For a definition, see Duistermaat & Kolk (2000, page 53). 
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Appendix C. Proof of Proposition 3.5. 

This part is a corollary of Theorem 1.11.4 of Duistermaat & Kolk (2000). This theorem states that 
there is only one differentiable structure on the orbit space which satisfies the following: Suppose that, 
for every [Y] G M* ^, we have an open neighbourhood U C M* ^ and a bijective map: 

T:T:-\U)^UxOa, Y ^ {'k{Y),x{Y)), 

such that, for every Y e 7r~^(Zi), Q G O^, riYQ) = {■k{Y),x{Y)Q)- The differentiable structure on M*^ 
is the one which makes r into a diffeomorphism. The topology of M* ^ obtained in this manner is equal 
to the quotient topology. 

Let Y £ T*j and ay be a section over Uy defined in (3.4). We define Ty : 7r~^{Uy) UyxOd as 
follows. For Z e TT^^dZ]), [Z] e Uy, there is a unique element Qz & Od such that Z = ay{[Z])Qz- 
Then we define Ty by Ty{Z) = {[Z],Qz)- By definition, we have that Ty^{[Z],Q) = ay{[Z])Q. Since 
Ty^ : n~^{Uy) UyxOd is a bijective map, we have that Ty is bijective, too. It can be easily verified that 
Ty satisfies the condition Ty{YQ) — {[Y],QyQ) of Theorem 1.11.4 of Duistermaat & Kolk (2000) stated 
above. It follows that Ty is a diffeomorphism if and only if ay : U ^ ay{U) is a diffeomorphism. Thus, 
the differentiable structure on M* ^ is the one which makes ay : Uy a{Uy) into a diffeomorphism. □ 

Appendix D. Proof of Theorem 3.6. 

Let X G Cn,d and suppose that rank(X) = k < d. Then there is a F G Tn,k such that YY^ = X, 
by Theorem 3.3. Apply to Y the procedure^ outlined in Section 3.3, to obtain a lower-triangular matrix 
y* € Tn,k, such that Y*{Y*)'^ = X. A lower-triangular matrix Y G Chol„,d that satisfies YY'^ = X can 
now easily be obtained by setting 



Y 



Y* ^ 

nx{d—k) 

which was to be shown. □ 
Appendix E. Proof of Theorem 3.7. 

First, we prove the 'only if part. Note that it is sufficient to show that the map s : Chol„,d Cn,d 
is open. For then if Y attains a local minimum of F on the open neighbourhood U C Chol„,d, then 
s{Y) = YY^ attains a local minimum of F on the open neighbourhood s{U) of YY^ , since for any 
X' = Y'Y''^ G s{U), F{X') = F(Y'Y''^) = F{Y') > F{Y) = F{YY^). 

To show that s : Chol„^d — > Cn,d is open, note that it is sufficient to show that tt : Chol„,d — > Mn,d is 
open, since Vl/ : M„ ^ Cn.d is a homeomorphism (see Proposition 3.4, item 3) and s = ^ o n. 

Suppose, then, that U is open in Chol„ d- We have to show that 7r~^(7r(Z//)) is open in T^^d^ by 
definition of the quotient topology of Mn^- We have 

n-\ Tr{U) ) = {YQ : Y eU, QeOd}. 

It is sufficient to show that the complement {n^^{TT{hl))y is closed. Let {F^*^} be a sequence in 
(TT-'^(Tr(JJ))y converging to Y, i.e. lim^^oo \\Y^'^ - Y\\ = 0. We can write F« = Z«Q(^) with Z^''> G 
and Q(') G Od- Then, 

lim ||yW - Y\\ ^ lim - Y\\ = lim - Y{Q'^'^f\\ = 0. (E.l) 

Since x Od is compact, there exists a converg ent subsequence {(Z(*^), Q('^))}, with Z'-'^^ -> Z* & W 
and Q(^^) ^ Q*, say. From (E.l) it follows that Z* = r(Q*)^ G which implies Y G {TT-\Tr(U)))'' . 
The reverse direction is obvious since the map s : Chol„,d — > Cn,d is continuous. □ 

Appendix F. Proof of Lemma 6.1. 

It is recalled from matrix analysis that Xi and X2 admit a joint eigenvalue decomposition if and only 
if their Lie bracket [Xi, X2] = X1X2 — X2X1 equals zero. Define C{X) := —ip + A. Note that 2AF is the 
projection 7rjVyT„,d(-FV) of Fy onto the normal space at Y. Note also that 

YY^ + C{X) = C{X). (F.l) 



^The procedure in Section 3.3 is stated in terms of d, but k should be read there in this case. 
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We calculate 

C{\)Y = {-^ + x)Y = -\Fy + \-KN.T^,AFy ) = 0. (F.2) 

The last equality follows from the assumption that gradi^(F) = 0, i.e. the differential Fy is normal at 
Y. (Hero, gradi^(y) denotes the gradient on Tn,d-) It follows from (F.2) and from the symmetry of C{\) 
that 

(i) YY^CiX) = and also, 

(ii) [yr^,(7(A)]=o. 

From (ii), YY^ and C{\) admit a joint eigenvalue decomposition, but then also jointly with C(A) because 
of (F.l). Suppose C{X) = QDQ^. From (i) we then have that D^^ and Da cannot both be non-zero. The 
result now follows since YY"^ is positive semidefinite and has rank less than or equal to d. □ 

Appendix G. Proof of Theorem 6.2. 

Define the Lagrangian 

C{X,X) :=-||C-X|||,-2A'^diag(C-X), and 



V{X) := min { C{X, A) : rank(X) = d}. (G.l) 

Note that the minimization problem in Equation (G.l) is attained by any {C{X)}d (see e.g.. Equation 
(30) of Wu (2003)). For any X e C„,d, 

\\C-X\\% ~C{X,X*) > -^A*) i \\C-{C{X)}d\\l. 

(This is the equation at the end of the proof of Theorem 4.4 of Zhang & Wu (2003).) Here (in-)equality 

(a) is obtained from the property that X S Cn,d, 

(b) is by definition of V, and 

(c) is by assumption of (6.2). □ 



